CERN-TH/2003-088 



Lattice QCD and the Schwarz alternating procedure 



Martin Liischer 

CERN, Theory Division 
CH-1211 Geneva 23, Switzerland 



Abstract 

A numerical simulation algorithm for lattice QCD is described, in which the short- and long- 
distance effects of the sea quarks are treated separately. The algorithm can be regarded, to 
some extent, as an implementation at the quantum level of the classical Schwarz alternating 
procedure for the solution of elliptic partial differential equations. No numerical tests are 
reported here, but theoretical arguments suggest that the algorithm should work well also 
at small quark masses. 



1. Introduction 

The simulation algorithms for (unquenched) lattice QCD that are currently in use 
rapidly become inefficient on large lattices and at small quark masses, where the 
effects of spontaneous chiral symmetry breaking set in. In the case of the HMC [1], 
the PHMC [2,3] and the Multiboson [4] algorithms, the principal technical difficulty 
derives from the fact that the light quark masses have to be scaled proportionally 
to the square of the pion mass in the chiral limit. The lattice Dirac operator is then 
increasingly ill-conditioned, which affects all these algorithms in a similar and rather 
direct way, because they all start from a global pseudo-fermion representation of the 
quark determinant that involves an exact (or an accurate approximate) inversion of 
the Dirac operator. 
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In the present paper a simulation algorithm is proposed that exploits the under- 
lying local structure of the theory and that may be expected to scale in a more 
favourable way in the chiral regime. The general strategy is closely related to the 
alternating procedure that was invented by the mathematician Schwarz in the 19th 
century to establish the existence of the solution of the Dirichlet problem 

A/(x)| xen = 0, f(x)\ xeen = g(x), (1.1) 

on arbitrary bounded domains Q in the plane [5] (for an introduction to the subject 
in the context of discretized partial differential equations see ref. [6], for example). 
Very briefly this method obtains the solution iteratively by dividing f2 into a set of 
overlapping subdomains and by solving the Dirichlet problem on these, in each step 
of the iteration, with boundary values determined from the current approximation 
to the solution. 

The design of simulation algorithms for lattice QCD that operate on overlapping 
blocks of lattice points is non-trivial, however, because the global correctness of the 
simulation must be guaranteed. In principle the problem can be solved using stochas- 
tic acceptance-rejection steps similar to those previously considered by Hasenbusch 
[7] (see also refs. [8-11]). The key question is then whether these correction steps 
can be implemented so that high acceptance rates are achieved, and a significant 
part of the present paper is therefore devoted to this issue. 



2. General form of the algorithm 

In this section the proposed algorithm is described in outline. Most technical details 
are deferred to the later sections and important improvements (such as precondi- 
tioning) are omitted in order to keep the presentation as simple as possible. Some 
algorithm theory, as summarized in appendix A, is nevertheless required to be able 
to understand the procedure. 

2.1 Preliminaries 

Although the algorithm is more generally applicable, only the standard Wilson for- 
mulation [12] of lattice QCD will be considered here (optionally including 0(a) 
improvement [13,14]) with a doublet of mass-degenerate quarks. The lattice spacing 
is set to unity for convenience and the SU(3) link variables are denoted by U(x,fi) 
as usual. 
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After integration over the quark fields, the probability distribution of the gauge 
field reads 




(2.1) 



where Z denotes the partition function, Sq the plaquette action, D w the Wilson- 
Dirac operator and mo the bare quark mass. 

2.2 Alternating procedure 

Following the classical Schwarz procedure, the proposed algorithm visits rectangular 
blocks of lattice points according to some scheme and updates the link variables 
residing there. The blocks can have arbitrary sizes in principle, and their position 
may be chosen randomly, for example, so that all link variables are treated equally. 
However, as will become clear later, the algorithm is designed to perform particularly 
well if the blocks are small in physical units, i.e. if their edges are less than about 
1 fm long. 

On each block A that is visited in the course of this process, changes in the gauge 
field on A are proposed that satisfy detailed balance with respect to the distribution 



A stochastic acceptance-rejection step then needs to be applied to correct for the 
difference between this distribution and the exact distribution (2.1). The operators 
Qa and Qa* that appear here coincide with Q, except that they act on Dirac fields 
defined on the block A and its complement A*, respectively, with Dirichlet boundary 
conditions. At the level of the fermion action, a complete decoupling of the quark 
fields inside and outside the block is thus achieved. In particular, the proposals for 
the link variables residing in A can be generated locally using a block version of the 
HMC algorithm, for example. 

The stochastic acceptance-rejection step, on the other hand, involves an inversion 
of the Dirac operator Q on the full lattice. There is a fair amount of choice in the 
detailed implementation of this step, which can be exploited to reduce the influence 
on the acceptance probability of the link variables far away from the block. Moreover, 
the suggested procedure (which is explained in sect. 5) restricts the pseudo-fermion 
field that needs to be introduced at this stage to the boundary of the block. At least 
to some extent, the local character of the algorithm is thus preserved. 



Pa[U] = ^- e" 5c det(Q A + Qa*) 2 . 



(2.2) 
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Fig. 1. Proposed updates of the link variables on small blocks (dark grey squares) 
can be filtered through acceptance-rejection steps on increasingly larger blocks (light 
grey squares) before the global acceptance-rejection step is applied. 

2.3 Hierarchical structure 

In the form described above the algorithm requires a computational effort per block 
update that increases roughly proportionally to the lattice volume. By introducing 
a hierarchy of blocks such as the one shown in fig. 1, one may, however, be able to 
do better than this. Proposed changes of the link variables on the smallest blocks 
are then obtained as before, and these are taken as proposed configurations on the 
next larger blocks, and so on, accepting or rejecting them so that detailed balance is 
satisfied with respect to the associated distributions -Pa[?7]. Finally a simultaneous 
global acceptance-rejection step is applied to the surviving configurations. 

This procedure is logically correct since non-overlapping blocks are decoupled from 
each other and can be updated in parallel. On the other hand, it will only work out 
in practice if the configurations that have been accepted on the largest blocks have a 
high probability to be accepted on the whole lattice. For the chosen implementation 
of the acceptance-rejection steps, a theoretical argument will be given in sect. 5 that 
explains why this should be so (under certain conditions). 

2.4 Low-mode reweighting 

On physically small blocks A, the Dirac operator Qa that acts on the quark fields in A 
is not expected to become ill-conditioned at small quark masses. The situation here 
is actually similar to the one encountered in studies of the Schrodinger functional, 
where the boundary conditions provide an infrared cutoff on both the gluon and the 
quark modes [15,16]. As a consequence the block simulation algorithm should work 
well even if the bare quark mass itiq is set to the critical mass m c [17]. 

However, since the Wilson formulation of lattice QCD violates chiral symmetry, 
the Dirac operator on large lattices is not protected against the accidental pres- 
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ence of eigenvectors with eigenvalues much smaller than nio — m c . When a gauge 
field configuration is proposed where this happens, it will only rarely pass the global 
acceptance-rejection step. Such configurations are therefore sampled with low statis- 
tics and this can easily lead to uncontrolled statistical fluctuations if a quantity is 
considered that is sensitive to the low modes of the Dirac operator (the propagator 
of the pseudo-scalar density, for example). 

It may be possible to solve this problem by including the product 

n 

w[u] = n Afc ( 2 - 3 ) 

k=i 

of the first few eigenvalues Ai, . . . , A n of Q 2 in the observables and the inverse factor 
in the global acceptance-rejection step. The computational overhead for 
this modification may not be negligible, but it should be noted in this connection 
that the eigenvalues do not need to be computed very accurately (as long as a definite 
procedure is used that obtains the eigenvalues independently of the previous gauge 
field configurations). Moreover the associated approximate eigenvectors can help to 
accelerate the inversion of the Dirac operator that is required in the acceptance- 
rejection step [18]. 



3. Domain decomposition 

Let A be an arbitrary rectangular block of lattice points. A and its complement A* 
(the set of points not in A) define a particular case of a domain decomposition of 
the lattice. In the following paragraphs the aim is to introduce some basic notation 
related to this decomposition, but the terminology is quite general and extends to 
the case where A is replaced by the union of a set of non-overlapping blocks. 

3.1 Boundary points 

On the lattice it is important to distinguish between interior and exterior boundary 
points (see fig. 2). The boundary values for the classical Dirichlet problem on A, 
for example, should be specified on the set <9A of all exterior boundary points if 
the standard nearest-neighbour lattice laplacian is used, while the set of interior 
boundary points plays an analogous role from the point of view of the complementary 
domain A* and is therefore denoted by <9A*. 
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Fig. 2. Two-dimensional slice of a 6 4 block (black and grey points inside the dashed 
square) . The grey and the open points represent the interior and the exterior boundary 
points of the block respectively. 

A special feature of this geometry is that for each exterior boundary point there 
is a unique closest point on the interior boundary, referred to as its partner point. 
The converse is evidently not true, i.e. different exterior boundary points may have 
the same partner point. 

3.2 Decomposition of the Dime operator 

In position space the hermitian lattice Dirac operator is a sparse matrix that assumes 
the block-diagonal form 



if the lattice points are ordered so that those in A come first. The operator Qa, for 
example, acts on Dirac fields on A in the same way as Q, except that all terms in- 
volving the exterior boundary points are set to zero (which is equivalent to imposing 
Dirichlet boundary conditions on <9A). 

It is often convenient to consider Qa, Qa*, Qoa arid Qqa* to be operators in the 
space of Dirac fields that are defined on the whole lattice rather than on A or A* 
only. The embedding is done in the obvious way by padding with zeros. Equation 
(3.1) may then be written in the form 



and an equivalent expression for the determinant in eq. (2.2) is det Q\ det Q\. . This 




(3.1) 



Q = Qa + Qa* + Qoa + Qoa* , 



(3.2) 
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notation is perhaps slightly abusive but should not lead to any confusion since it is 
usually clear from the context which interpretation applies. 

The operator Qqa and its hermitian conjugate Qoa* connect the domains A and 
A* to each other. Explicitly QdA is given by 

3 

QdAip(x) = -75#a(z) - 'y tl )0A*(x + p,)U(x, n)ip(x + fi) 

/i=0 

+ 1(1 + Jh)0a* (x - [i)U(x - A, V(a? - A)}, (3-3) 

where A and 6 a* denote the characteristic functions of A and A* respectively!. This 
operator thus transports the Dirac spinors on the exterior boundary points to the 
partner points on the interior boundary, while QdA* does the same in the opposite 
direction. 

3.3 Dirichlet problem & space of boundary values 

To be able to properly pose the Dirichlet boundary value problem for the lattice 
Dirac operator on A, the space of boundary values first needs to be specified. The 
situation is practically the same as in the case of the Schrodinger functional which 
was studied in ref. [16]. It is useful in this connection to introduce the projector- 
valued function 

3 

dA (x) = J2 M*) {5(1 - ^Wa(x -£) + |(1 + ^)6 A (x + £)} (3.4) 

fi=0 

that will, in many respects, play the role of the characteristic function of the exterior 
boundary. The only non-vanishing terms on the right-hand side of eq. (3.4) are in 
fact those where x is in <9A and x — fi or x + ft its partner point, and the projectors 
2(1 + 7^) are precisely such that the identity 

QdA = QdA&dA (3.5) 
holds. 

The Dirichlet boundary value problem for the lattice Dirac operator is now to find 
a Dirac field tp(x) on A U <9A that satisfies 

Qip(x) = for all x G A, (3.6) 

I The normalization conventions and all unexplained notations are as in ref. [14]. 
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9dA(x)ip(x) = ip(x) = 7](x) for all x € dA, 



(3.7) 



where rj = 9g\rj is any prescribed field on dA. Recalling the decomposition (3.1) of 
the Dirac operator, it is clear that Q can be replaced by Qa + Qqa in eq. (3.6). At 
all points in A the solution is then given by 



where use has been made of the fact that QdA moves the field i] from the exterior to 
the interior boundary of the block. In particular, the solution is uniquely determined 
by the specified boundary values (if Qa is invertible). 



4. Block simulation algorithm 

For the generation of proposed gauge field configurations on a given block A, the 
HMC and PHMC algorithms are both equally suitable. Some relevant details are 
given in this section for the case of the HMC algorithm. The correctness of the local 
form of this algorithm is easily shown by noting that it matches the general pattern 
outlined in subsect. A. 4. 

4-1 Active and spectator link variables 

As already mentioned in sect. 2, the proposed updates of the gauge field on A must 
satisfy detailed balance with respect to the distribution (2.2), which is now written 
in the form 



To achieve a complete decoupling from the surrounding lattice, only the link variables 
residing on a subset of links in A, the active link variables, should be changed. In 
particular, if the links shown in fig. 3 are selected, the factor det Qa* is guaranteed 
to be independent of the active link variables, independently of whether the 0(a)- 
improved or the unimproved theory is considered. 

At this point the HMC algorithm can be set up as usual by introducing a pseudo- 
fermion field <j){x) on A and the canonical momenta H(x, /j.) of the active link vari- 
ables U(x,fi). The total action of the system is then taken to be 



(3.8) 



Pa[U] = e" 5c det Q\ det Q\, . 



(4.1) 



•A 




(4.2) 
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Fig. 3. The block update algorithm only changes the link variables on a subset of 
links in the chosen block (solid lines). In particular, the link variables on the interior 
boundary and those connecting the interior to the exterior boundary are held fixed. 

where the brackets (•, •) denote the natural scalar products in the relevant spaces of 
fields. Note that the factor det Qa* can be ignored, since it does not depend on the 
active link variables. All inactive link variables actually play a spectator role in the 
following, which is rather similar to what happens in the case of the widely used 
simulation algorithms for pure gauge theories, where the link variables are updated 
one after the other. 

4-2 Variable- speed molecular dynamics 

Following the standard procedure, a new configuration of the active link variables is 
now generated by first choosing the pseudo-fermion field and the canonical momenta 
randomly with conditional probability proportional to e~ s . The momenta and the 
active link variables are then evolved from their initial values n, U to some final 
values IT, U' by solving the molecular dynamics equations 



— n(a?,/i) = -j(x,n)F(x,n), 



dr 



U(x,fjt) = j(x,fi)U(x,n)U(x,fi), 



(4.3) 
(4.4) 



from r = to r = 1. As usual the force F(x,fi) takes values in the Lie algebra of 
SU(3) and is determined by the requirement that 



(u,F) 



lim 



1 



U^U e 



U t (x,n)=e^ x ^U(x,ri, 



(4.5) 
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for all variations u(x, fi) of the active link variables. It should be emphasized that the 
computation of the force is an operation that is local to the block A. In particular, 
only the Dirac operator Q\ needs to be inverted. 

The action and the integration measure are both preserved in the course of the 
molecular dynamics evolution, for any choice of the (real) speed factor j(x,fj,). This 
factor is usually set to 1, but for reasons explained later, it may be advisable, in 
the present context, to take a smaller value on the links close to the boundary of A. 
The associated link variables are then slowed down and tend to deviate less from 
their initial values at the end of the evolution (stochastic equations where different 
field components are evolved at different speeds have previously been considered by 
Davies et al. [19]). 

It is, incidentally, straightforward to incorporate this modification in the numerical 
integration scheme that is used to solve the molecular dynamics equations. As in the 
case of the standard HMC algorithm, an acceptance-rejection step is finally required 
to correct for the integration error and thus to obtain a transition probability that 
satisfies detailed balance with respect to the distribution Pa[U]. 



5. Global acceptance rejection step 

The gauge field configurations that are generated by applying the local algorithm 
described above can be taken as proposed configurations for the theory on a larger 
block or on the whole lattice. Whether a proposed field is accepted or not is then 
decided by applying a stochastic acceptance-rejection step. A particular implemen- 
tation of this step is now going to be discussed, first for the case of the transition 
from a given block A to the full lattice. 

5.1 Acceptance probability 

Following the general strategy summarized in subsect. A. 5, the probability distribu- 
tion -Pa[J7] is considered to be an approximation to the exact distribution P[U]. An 
auxiliary pseudo-fermion field x is then added to the system with action 



S x = (x, (Qa + Qa")Q~ 1 q 2 aQ~ 1 (Qa + Qk*)x) , 



(5.1) 



where a local shape function 



0a(x) = 9 9 a(x) + e (1 - Qd\(x)) 



(5.2) 
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with parameter e > 0, has been included for reasons that will become clear in a 
moment. The simultaneous probability distribution of the enlarged theory, 

P[x,U} = le- s -P A [U], (5.3) 
Z 

reproduces the correct distribution when the auxiliary field is integrated out. Note 
that qa is invertible and independent of the fields. The associated determinant is 
therefore a constant that cancels out once the normalization factors are taken into 
account. 

Before the acceptance-rejection step can be applied, a pseudo-fermion field x must 
be generated randomly with conditional probability P[x\U}. This is easily achieved 
by setting 

X=(Qa + Qa*)~ 1 QqI 1 V, (5-4) 

where n is chosen with probability proportional to e - ^'^. The proposed field U' 
generated by the block simulation algorithm is then accepted with probability 

P acc = min{l,e- A5 *} , AS X = S^^, - S x , (5.5) 



which ensures that the gauge field is correctly updated with a transition probability 
that satisfies detailed balance with respect to the exact distribution P[U]. 

5. 2 Elimination of the volume terms 

In the following lines it is shown that AS X only depends on the values of n on the 
exterior boundary <9A, up to terms that are proportional to the parameter e of the 
shape function qa and that can, therefore, be eliminated by taking the limit e — > 0. 
The proof starts from the identities 



AS X = (CC)-( V , V ), (5-6) 

C = QaQ'- 1 (Q'a + Qa.) (Qa + Qa*)" 1 Qq^V 
= V + QAQ'' 1 {Q'a ~ Qa) (Qa + Qa*)" 1 (Qoa + QdA*) Q^V, (5-7) 



where the last line has been obtained using the block decomposition (3.2) of the Dirac 
operator and the fact that Qa* , QdA an d QdA* are independent of the active link 
variables. The crucial observation is then that all terms proportional to QdA*Q~A lr l 
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vanish since Q'^ — Qa acts in the subspace of fermion fields supported on A. Recalling 
eqs. (3.5) and (5.2), this leads to 



C = V + QaQ'- 1 (Q'a - Qa) Qa'Qoav 



(5.8) 



and thus to an expression 



AS x = 2Re {9 aA r,,0 + (£,0 + 0(e), 



(5.9) 



£ = OoaQ 1 ' 1 {Q'a ~ Qa) Qa'Qoav, 



(5.10) 



of the type announced above. 

At this point the parameter e can safely be taken to zero because all terms that are 
inversely proportional to e have disappeared. Moreover, since only the component 
6dAV °f the field r\ enters the final expressions, the constraint 



may now be imposed without loss. The field is thus restricted to the linear space of 
boundary values for the Dirichlet problem on A (cf. subsect. 3.3). 

5.3 Acceptance rate 

Evidently the whole approach will fail in practice unless the acceptance probability 
(5.5) is reasonably large on average. This will obviously be the case if A5 X is only 
rarely greater than 1, and the question thus arises of whether there is any theoretical 
reason to expect this to be so. 

Some insight into the problem can be gained by assuming, for a moment, that U' 
differs from U only on a single link I in A. In eq. (5.9) the first term then involves the 
propagation of the random field rj from the exterior boundary dA to the endpoints 
of I and a second propagation from there back to the boundary. The other term has 
a similar structure but involves altogether four quark propagators. 

Quark propagators typically decay like d~ 3 in the short-distance regime and more 
rapidly at larger distances d. The contributions to AS X with two quark propagators 
are thus suppressed by a factor d~ 3 d'~ 3 at least, where d and d! are the distances of 
the link I from the initial and final points on dA. The sum over all these points then 
still needs to be performed, but since rj is a random field there are strong cancellations 
in this sum and on average the result is enhanced by a factor proportional to the 
number of boundary points only (rather than its square). An important suppression 



V = 8dAV 



(5.11) 
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factor thus remains, particularly when the link I is a fair distance away from the 
boundary. 

Reasonable acceptance rates can thus be expected if the proposed changes in the 
link variables on the links closer to the boundary of the block are not too large. By 
adjusting the speed factor j(x,fi) in the molecular dynamics equations (4. 3), (4.4), 
this can be easily achieved. Note that there must also be important cancellations 
in the sum over the active link variables that is implicit in eq. (5.10), because they 
change in random directions in group space. 

5.4 Numerical exercises 

Some numerical confirmation of this qualitative argumentation is clearly desirable 
at this point. The decay of the quark propagator away from the boundary of the 
block, for example, can be checked by calculating the solution ip = —Q^QdW of 
the Dirichlet problem on A. A representative result of such studies for the average 
of 1^(^)1 over all boundary values and gauge fields is reported in fig. 4. 

Another check on the correctness of the theoretical argumentation is obtained by 
computing the average of the acceptance probability P acc in the quenched approxi- 
mation, where the proposed changes in the active link variables are generated using 
a link update algorithm. The results in this case are rather encouraging too and 
they demonstrate, in particular, the importance of the elimination of the volume 
terms. 

In all these studies the dependence of the calculated quantities on the quark mass 
appears to be fairly weak, as long as the edges of the block are smaller than 1 fm or 
so. This is perhaps not totally surprising in view of what has been said above, but 
the observation suggests that the algorithm will remain effective also at small quark 
masses. 

5. 5 Hierarchical filter 

As was briefly discussed in subsect. 2.3, a hierarchical structure should be adopted 
on large lattices where the proposed gauge field configurations are filtered through a 
sequence of blocks of increasing size. The filter involves an acceptance-rejection step 
for each transition from a block A to the next larger block £1. Equations (5.9)— (5.11) 
remain valid in this case if Q' is replaced by Q' n . An important point to note here is 
that the distance of the links where the active link variables reside to the boundary 
of the blocks is increasing. The acceptance probability for the step from one block 
to the next is therefore expected to rise quickly. 

The proposed configurations that remain after the application of the hierarchical 
filter must finally be submitted to a global acceptance-rejection step. If several 
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Fig. 4. Average magnitude squared of the solution ip(x) of the Dirichlet problem 
(3. 6), (3. 7) on a 12 4 block as a function of the distance d from the exterior boundary 
of the block. The data shown are from a simulation of quenched QCD at a lattice 
spacing of about 0.1 fm. Diamonds, squares and circles correspond to values of the 
quark mass where the pion mass is known to be approximately equal to 320, 457 and 
579 MeV respectively [20]. 



configurations are submitted simultaneously, as suggested in subsect. 2.3, the accep- 
tance probability is again given by eqs. (5.5) and (5.9)-(5.11), where A now stands 
for the union of the big blocks that contain the proposed configurations. The bound- 
ary dA is then the union of the boundaries of these blocks and Oqj^ the sum of the 
corresponding characteristic functions. In particular, there are terms in AS X that 
couple the random field r\ on the boundaries of different blocks. 



6. Miscellaneous remarks 



(1) Preconditioning. As is generally the case in lattice QCD, the preconditioning of 
the Dirac operator can be expected to have a significant impact on the performance 
of the proposed algorithm. In particular, it is straightforward to implement even-odd 
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preconditioning both in the block simulation algorithm and in the global acceptance- 
rejection step. 

(2) Block simulation algorithm. There is a range of algorithms that can be used to 
generate the proposed gauge field configurations on a given block. In particular, most 
improvements of the HMC and the PHMC algorithms that have been introduced 
over the years, such as the two-boson method [21,22], should be beneficial here too. 

A more radical possibility is to generate the configurations through a standard 
link update algorithm, using an adapted gauge field action, and to apply a stochastic 
acceptance-rejection step to correct for the quark determinant [7-11]. Following the 
lines of subsect. 5.2, the random pseudo-fermion field that needs to be introduced 
in the correction step can be reduced to the support of Q' A — Q\ in this case. This 
will no doubt increase the average acceptance probability, but whether the approach 
can compete with the block HMC algorithm, for example, remains to be seen. 

(3) Parallel processing. The proposed algorithm is well suited for parallel comput- 
ers, because distant blocks can be updated independently of each other. Different 
distributions of the workload are conceivable, and it is not required that the small- 
est blocks be processed on single nodes, although in this case the communication 
overhead is minimized. It is, incidentally, advisable to translate the gauge field after 
each cycle rather than shifting the block positions. The program then always oper- 
ates on the same blocks and their position can be chosen so as to fit the processor 
grid in the best possible way. 



7. Conclusions 

The problem to find efficient simulation algorithms for unquenched lattice QCD in 
the chiral regime has been around for many years. Whether the approach described 
in this paper provides a viable solution is not certain and can only be decided after 
extensive numerical tests have been performed. 

It is quite clear, however, that significant progress in this area can only be made 
if the short- and long-distance effects of the sea quarks are treated differently. This 
insight is not new and has motivated the truncated determinant approximation of 
Duncan et al. [23,24], for example, and a modification of the HMC algorithm with 
multiple molecular dynamics time scales [25]. To some extent, the two-boson method 
studied in refs. [21,22] can also be put under this general heading. 
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The algorithm proposed here separates the short-distance effects from the long- 
distance ones by updating the gauge field on physically small blocks of lattice points 
that are visited sequentially. On the blocks the theory can be simulated even at 
vanishing quark masses, because the chosen boundary conditions imply an infrared 
cutoff on the spectrum of the Dirac operator. The long-distance effects of the sea 
quarks are then incorporated by a sequence of acceptance-rejection steps that lead 
from the small blocks to larger blocks and eventually to the full lattice. 

I wish to thank Martin Hasenbusch and Ulli Wolff for correspondence on fermion 
simulation algorithms and Peter Weisz for a critical reading of the manuscript. The 
computer time for the numerical studies reported in sect. 5 has been kindly provided 
by DESY-Hamburg and by the Institute for Theoretical Physics at the University 
of Berne. 



Appendix A. Transition probabilities 

To prove the correctness of the algorithm proposed in this paper it suffices to write 
down the transition probabilities that are associated to the various algorithmic steps 
and to recall some basic facts from the general theory of numerical simulations. For 
completeness, the relevant theoretical results are summarized in this appendix. 

A.l Convergence theorem 

To avoid any unnecessary complications, an abstract discrete system will be consid- 
ered with a finite number of states s. The thermal equilibrium distribution of the 
states is denoted by P(s) and it is assumed that P(s) > for all s. 

Numerical simulation algorithms for this system (as far as they follow the standard 
procedures) are defined by a transition probability T(s — > s') that satisfies 

1. T(s — ► s') is non-negative and such that J2 S ' T{s — > s') = 1 for all s. 

2. The equilibrium distribution is preserved, i.e. for all states s' the equation 
£ s P(s)T(s -» s') = P(s') holds. 

3. The recurrence probability T(s — > s) is non-zero for all s and any state can 
be reached from any other state in a finite number of transitions. 

In practice the simulation starts from an initial state sq and generates a sequence 
of states si,S2,... recursively with transition probability T(sk — ► Sfc+i)- The fun- 
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damental theorem of simulation theory then asserts that the states in the sequence 
will be asymptotically distributed according to the equilibrium distribution. 



A. 2 Composition rule & detailed balance 

If Ti and T 2 are any two transition probabilities that satisfy conditions 1-3, it is 
trivial to show that the same is true for the composed transition probability 

T(s^ s ') = Y J T l {s^r)T 2 {r ^ s'). (A.l) 

r 

The associated algorithm generates an intermediate state r with probability Xi (s — > 
r) and then the new state s' with probability T 2 (r — > s'). Evidently any number of 
algorithms can be combined in this way. 

Transition amplitudes that satisfy detailed balance, 

P{s)T(s -> s') = P(s')T(s' -> s), (A.2) 

are an important special case. If detailed balance holds, property 2 is an immediate 
consequence of property 1. However, the composed transition probability (A.l) in 
general does not satisfy detailed balance if Tl and T 2 do, because the order of the 
factors in eq. (A.l) matters. It is possible to correct for this deficit by choosing the 
order randomly or by forming reversion-symmetric products. 

A. 3 Acceptance-rejection method 

Valid algorithms may often be obtained from transition amplitudes T (s — > s') that 
satisfy detailed balance with respect to some approximate distribution Pq(s) [26]. 
The idea is to first propose a new state s' according to this probability and to accept 
it with probability 

Otherwise (i.e. when s' is rejected) the old state s is taken to be the new one. 
The transition amplitude that is associated to this algorithm, 

T(s -> a') =T {s^ s >)P acc ( S}S ') + 5 ss , - 5 SS , ^T (s -► r)P acc {s,r), {A A) 

r 

satisfies detailed balance and also conditions 1 and 3. It must be stressed, however, 
that for this statement to be true it is essential that T (s — > s') satisfies detailed 
balance with respect to the distribution Pq(s) and not only condition 2. 
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A. 4 Using auxiliary stochastic variables 

When the equilibrium probability distribution is too complicated to be treated di- 
rectly, an equivalent but more accessible system may sometimes be found that in- 
volves auxiliary stochastic variables. In the case of the HMC algorithm, for example, 
the auxiliary variables are the pseudo-fermion field and the momenta of the link vari- 
ables [1]. A simulation algorithm may then be set up for the enlarged system, and 
this leads to a correct algorithm for the original system under certain conditions. 

In abstract terms the states of the enlarged system are labelled by pairs v,s, where 
v and s represent the auxiliary and the basic variables respectively. The equivalence 
to the original system is then expressed through the identity 

P(s) = J2Hv,s) (A.5) 

V 

in which P(v, s) denotes the equilibrium distribution of the enlarged system. In the 
following the conditional probability 

P(v\s) = P(v,s)/P(s) (A.6) 



to find v given s will play an important role. 

Now if T(v, s — > v ', s') is any given transition probability for the enlarged system 
that satisfies conditions 1 and 2 (but not necessarily condition 3), an update algo- 
rithm for the original system is obtained as follows. Starting from an initial state 
s, the auxiliary variables v are first chosen randomly with conditional probability 
P(v\s). After that, a state v',s' is generated with probability T(v,s — > v',s') and 
s' is taken to be the new state of the original system. It is straightforward to show 
that the associated transition probability 

T(s ^s') = Y^ P(v\s)f(v, s -► v', s') (A.7) 

v,v' 

satisfies conditions 1 and 2. Moreover, detailed balance holds if the transition prob- 
ability T(v,s — ► v',s') has this property relative to the equilibrium distribution of 
the enlarged system. Whether condition 3 is fulfilled depends on both P(v\s) and 
T(v, s — > v', s') and has to be verified in each case. 

A.5 Stochastic acceptance-rejection method 

The starting point here is again an algorithm with transition probability Tq(s — > s') 
that satisfies detailed balance with respect to some approximate distribution Pq(s). 
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An enlarged system is then considered, exactly as above, but the auxiliary variables 
are now only used in the acceptance-rejection step where (A. 3) is replaced by 



Pacc( S , s') = £ P(v\s) min ( 1, 1 • (A.8) 

V I P(v,s)P (s') J 

This means that once a new state s' of the original system is proposed, a random 
choice of the auxiliary variables v first needs to be made, with conditional probability 
P(v\s), before it can be decided whether s' should be accepted or not. 
The correctness of this method follows from the identity 

[P(s)/P Q (s)] P acc (s, s') = [P(s')/P (s')} P acc (s', s) (A.9) 

that is easily derived from the definition (A.8) and the properties of the distributions 
Po(s) and P(v,s). In particular, the associated transition probability (A. 4) satisfies 
detailed balance. It has recently been noted, however, that [11] 

P„, s ,^,™jEp(»i.),E^) 

which shows that the stochastic acceptance-rejection method tends to be less effi- 
cient than the non-stochastic method. 
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